
    BEAR: Benchmarking the Efficiency of RDF Archiving

    There is an emerging demand for techniques that address the problem of efficiently archiving and (temporal) querying different versions of evolving Semantic Web data. While systems for archiving and/or temporal querying are still in their early days, we consider this a good time to discuss benchmarks for evaluating the storage space efficiency of archives, the retrieval functionality they serve, and the performance of various retrieval operations. To this end, we provide a blueprint for benchmarking archives of semantic data by defining a concise set of operators that cover the major aspects of querying of and interacting with such archives. Next, we introduce BEAR, which instantiates this blueprint to serve a concrete set of queries on the basis of real-world evolving data. Finally, we perform an empirical evaluation of current archiving techniques that is meant to serve as a first baseline for future developments on querying archives of evolving RDF data. (Authors' abstract.) Series: Working Papers on Information Systems, Information Business and Operation
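    The abstract does not list the operators themselves. As a minimal sketch, assuming three retrieval operations commonly discussed for RDF archives (materialising a whole version, materialising the delta between two versions, and asking in which versions a pattern matches) and a change-based storage policy, the Python below shows how an archive might answer them. `ChangeBasedArchive` and its methods are hypothetical names, not BEAR's interface, and triples are plain string tuples rather than parsed RDF.

```python
from dataclasses import dataclass, field

Triple = tuple[str, str, str]  # (subject, predicate, object) as plain strings


@dataclass
class ChangeBasedArchive:
    """Hypothetical archive: version 0 stored in full, later versions as (added, deleted) deltas."""
    initial: set[Triple]
    deltas: list[tuple[set[Triple], set[Triple]]] = field(default_factory=list)

    def commit(self, new_version: set[Triple]) -> None:
        """Store a new version as its difference from the latest stored version."""
        current = self.materialize(len(self.deltas))
        self.deltas.append((new_version - current, current - new_version))

    def materialize(self, version: int) -> set[Triple]:
        """Version materialisation: replay deltas 1..version on top of version 0."""
        state = set(self.initial)
        for added, deleted in self.deltas[:version]:
            state -= deleted
            state |= added
        return state

    def delta(self, v1: int, v2: int) -> tuple[set[Triple], set[Triple]]:
        """Delta materialisation: triples added and deleted going from v1 to v2."""
        old, new = self.materialize(v1), self.materialize(v2)
        return new - old, old - new

    def versions_with_predicate(self, predicate: str) -> list[int]:
        """Version query: every version containing a triple with this predicate."""
        return [v for v in range(len(self.deltas) + 1)
                if any(p == predicate for _, p, _ in self.materialize(v))]


archive = ChangeBasedArchive(initial={("ex:a", "foaf:knows", "ex:b")})
archive.commit({("ex:a", "foaf:knows", "ex:b"), ("ex:b", "ex:age", "42")})  # version 1
archive.commit({("ex:b", "ex:age", "43")})                                  # version 2
print(archive.versions_with_predicate("ex:age"))  # prints: [1, 2]
```

    Change-based storage keeps the archive small, but materialising a late version costs time proportional to the number of deltas replayed; this is the kind of space/time trade-off such a benchmark is meant to expose.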

    Iterative Learning of Relation Patterns for Market Analysis with UIMA

    Blohm S, Umbrich J, Cimiano P, Sure Y. Iterative Learning of Relation Patterns for Market Analysis with UIMA. In: UIMA Workshop at GLDV Frühjahrstagung. 2007

    Evaluating Query and Storage Strategies for RDF Archives

    There is an emerging demand for efficiently archiving and (temporal) querying different versions of evolving Semantic Web data. As novel archiving systems are starting to address this challenge, foundations and standards for benchmarking RDF archives are needed to evaluate their storage space efficiency and the performance of different retrieval operations. To this end, we provide theoretical foundations for the design of data and queries to evaluate emerging RDF archiving systems. Then, we instantiate these foundations along a concrete set of queries on the basis of a real-world evolving dataset. Finally, we perform an empirical evaluation of various current archiving techniques and querying strategies on this data that is meant to serve as a baseline for future developments on querying archives of evolving RDF data.
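    To make the storage side of this evaluation concrete, here is a minimal sketch, assuming three storage policies often compared for RDF archives: independent copies (every version stored in full), change-based (the first version plus added and deleted triples per later version) and timestamp-based (each triple stored once per validity interval). The function names are hypothetical, and the size measure is simply a count of stored triples or intervals, not bytes on disk.

```python
def independent_copies_size(versions: list[set[str]]) -> int:
    """Every version is stored in full."""
    return sum(len(v) for v in versions)


def change_based_size(versions: list[set[str]]) -> int:
    """First version in full, then added + deleted triples for each subsequent version."""
    size = len(versions[0])
    for prev, cur in zip(versions, versions[1:]):
        size += len(cur - prev) + len(prev - cur)
    return size


def timestamp_based_size(versions: list[set[str]]) -> int:
    """Each triple stored once per maximal run of consecutive versions that contain it."""
    intervals = 0
    for triple in set().union(*versions):
        present_before = False
        for version in versions:
            present = triple in version
            if present and not present_before:
                intervals += 1  # a new validity interval starts at this version
            present_before = present
    return intervals


# Triples abbreviated to single letters for readability.
versions = [{"a", "b", "c"}, {"a", "b", "c", "d"}, {"a", "b", "d"}]
print(independent_copies_size(versions),
      change_based_size(versions),
      timestamp_based_size(versions))  # prints: 10 5 4
```

    Independent copies make version materialisation a direct lookup but store the most; the delta-style policies are smaller on slowly changing data but must reconstruct versions at query time.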

    PLoS One

    Objectives: To identify the reasons patients miss taking their antiretroviral therapy (ART) and the proportion who miss their ART because of symptoms, and to explore the association between symptoms and incomplete adherence.
    Methods: Secondary analysis of data collected during a cross-sectional study that examined ART adherence among adults from 18 purposefully selected sites in Tanzania, Uganda, and Zambia. We interviewed 250 systematically selected patients (aged ≥18 years) per facility on reasons for missing ART and symptoms they had experienced (using the HIV Symptom Index). We abstracted clinical data from the patients' medical, pharmacy, and laboratory records. Incomplete adherence was defined as having missed ART for at least 48 consecutive hours during the past 3 months.
    Results: Twenty-nine percent of participants reported at least one reason for having ever missed ART (1278/4425). The most frequent reason was simply forgetting (681/1278 or 53%), followed by ART-related hunger or not having enough food (30%), and symptoms (12%). The median number of symptoms reported by participants was 4 (IQR: 2–7). Every additional symptom increased the odds of incomplete adherence by 12% (OR: 1.1, 95% CI: 1.1–1.2). Female participants and participants initiated on a regimen containing stavudine were more likely to report greater numbers of symptoms.
    Conclusions: Symptoms were a common reason for missing ART, together with simply forgetting and food insecurity. A combination of ART regimens with fewer side effects, use of mobile phone text message reminders, and integration of food supplementation and livelihood programmes into HIV programmes has the potential to decrease missed ART and hence to improve adherence and the outcomes of ART programmes.
    (2016; funding: PEPFAR/United States; PMID: 26788919; PMCID: PMC4720476)
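    The per-symptom effect compounds multiplicatively. Assuming the reported figure comes from a model that is linear in symptom count on the log-odds scale (as "every additional symptom" suggests), a participant at the upper quartile of symptoms (7) would have roughly 1.12^5 ≈ 1.76 times the odds of incomplete adherence of one at the lower quartile (2). The sketch below also rechecks the reported proportions; `odds_multiplier` is a hypothetical helper, and 1.12 is taken from the stated 12% rather than the rounded OR of 1.1.

```python
# Proportions reported in the abstract.
missed_any = 1278 / 4425  # participants reporting at least one reason for missing ART
forgot = 681 / 1278       # share of those whose reason was simply forgetting
print(f"{missed_any:.0%} {forgot:.0%}")  # prints: 29% 53%


def odds_multiplier(per_symptom_or: float, extra_symptoms: int) -> float:
    """Odds ratio for `extra_symptoms` more symptoms, assuming a constant per-symptom effect."""
    return per_symptom_or ** extra_symptoms


# Upper vs lower quartile of reported symptom count (IQR 2-7): 5 extra symptoms.
print(round(odds_multiplier(1.12, 5), 2))  # prints: 1.76
```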

    Fast and Scalable Pattern Mining for Media-Type Focused Crawling

    Search engines targeting content other than hypertext documents require a crawler that discovers resources of certain media types. Naive crawling approaches do not guarantee a sufficient supply of new URIs (Uniform Resource Identifiers) to visit; effective and scalable mechanisms for discovering and crawling targeted resources are needed. One promising approach is to use data mining techniques to identify the media type of a resource without downloading its content. The idea is to apply a learning approach to features derived from patterns occurring in the resource identifier. We present a focused crawler as a use case for fast and scalable data mining and discuss classification and pattern mining techniques suited for selecting resources of specified media types. We show that we can process an average of 17,000 URIs/second and still detect the media type of resources with a precision of more than 80% and a recall of over 65% for all media types. Peer-reviewed.
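    The abstract does not spell out the classifier. As a minimal sketch of the general idea, predicting a media type from tokens of the identifier alone without downloading anything, the Python below trains a multinomial naive Bayes model on host and path tokens plus a separate file-extension feature. This is a swapped-in, illustrative technique, not the authors' pattern-mining method; the tokenisation rule, the example URIs and the class names are all assumptions.

```python
import math
import re
from collections import Counter, defaultdict
from urllib.parse import urlparse


def uri_features(uri: str) -> list[str]:
    """Lower-cased host/path tokens plus a separate file-extension feature."""
    parsed = urlparse(uri)
    text = (parsed.netloc + parsed.path).lower()
    feats = ["tok:" + t for t in re.split(r"[/._\-]+", text) if t]
    last_segment = parsed.path.rsplit("/", 1)[-1]
    if "." in last_segment:
        feats.append("ext:" + last_segment.rsplit(".", 1)[-1].lower())
    return feats


class NaiveBayesUriClassifier:
    """Multinomial naive Bayes over URI features with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha
        self.class_counts: Counter = Counter()
        self.feature_counts: defaultdict = defaultdict(Counter)
        self.vocab: set[str] = set()

    def fit(self, labelled: list[tuple[str, str]]) -> "NaiveBayesUriClassifier":
        for uri, media_type in labelled:
            self.class_counts[media_type] += 1
            for f in uri_features(uri):
                self.feature_counts[media_type][f] += 1
                self.vocab.add(f)
        return self

    def predict(self, uri: str) -> str:
        total = sum(self.class_counts.values())
        feats = uri_features(uri)
        best, best_score = "", -math.inf
        for media_type, n in self.class_counts.items():
            counts = self.feature_counts[media_type]
            denom = sum(counts.values()) + self.alpha * len(self.vocab)
            score = math.log(n / total)  # log prior
            for f in feats:
                score += math.log((counts[f] + self.alpha) / denom)  # smoothed log likelihood
            if score > best_score:
                best, best_score = media_type, score
        return best


train = [
    ("http://example.org/papers/report.pdf", "application/pdf"),
    ("http://example.org/docs/slides.pdf", "application/pdf"),
    ("http://example.org/img/photo.jpg", "image/jpeg"),
    ("http://example.org/gallery/cat.jpg", "image/jpeg"),
    ("http://example.org/data/dump.rdf", "application/rdf+xml"),
    ("http://example.org/vocab/schema.rdf", "application/rdf+xml"),
]
clf = NaiveBayesUriClassifier().fit(train)
print(clf.predict("http://other.net/files/thesis.pdf"))  # prints: application/pdf
```

    A real crawler would train on far more labelled URIs; identifiers without an extension fall back on host and path tokens alone, which is where richer pattern features would matter most.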

    Web of Data Plumbing - Lowering the Barriers to Entry

    Publishing and consuming content on the Web of Data often requires considerable expertise in the underlying technologies, as the services needed to achieve this are either not packaged in a simple and accessible manner or are simply lacking. In this poster, we address selected issues by briefly introducing the following essential Web of Data services designed to lower the entry barrier for Web developers: (i) a multi-ping service, (ii) a meta search service, and (iii) a universal discovery service.

    A Hybrid Framework for Querying Linked Data Dynamically

    As of today, the Web has evolved to become the largest collection of information made available by mankind. Researchers and developers are continuously working on transforming this loosely connected data collection into a giant knowledge base. As part of this trend, the Semantic Web community has started a movement to transform the Web of unstructured text into the so-called 'Web of Data': a framework to create, share and reuse data by humans and machines alike across application, enterprise, and community boundaries. From this movement, Linked Data has emerged as a set of best practices to publish, connect and discover structured data on the Web using standard formats. Today, there are over thirty billion public facts that can be accessed, reused and combined by individuals as well as organisations and companies. As the Web of Data continues to expand and diversify, it becomes more and more dynamic, with data being constantly generated, removed and updated, e.g., from sensor/stream sources. New querying techniques are required to keep up efficiently with this trend. While traditional approaches achieve fast query times by replicating Web data in optimised offline index structures, they cannot deal efficiently with dynamic data and cannot guarantee up-to-date results. A new generation of distributed Linked Data query engines addresses this problem and delivers up-to-date results by retrieving query-relevant data immediately before or during query execution. However, fetching data at runtime from potentially hundreds or thousands of relevant Web sources is slow compared to optimised index lookups. This thesis studies and improves distributed query approaches for Linked Data and develops a hybrid query framework that offers fresh and fast query results by combining centralised and distributed query techniques with a novel query planning approach based on knowledge about the dynamicity of data.
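    The abstract does not give the planner's details. As a minimal sketch, assuming each triple pattern comes with a selectivity estimate and a dynamicity estimate, a hybrid planner could answer stable patterns from the local index and route highly dynamic ones to live lookups. `PatternEstimate`, `plan`, the cut-off value and the example estimates are all hypothetical; the thesis's cost model is more elaborate than a single threshold.

```python
from dataclasses import dataclass


@dataclass
class PatternEstimate:
    pattern: str        # a triple pattern, written as text for illustration
    selectivity: float  # estimated fraction of indexed triples that match (0..1)
    dynamicity: float   # estimated chance the matching data changed since indexing (0..1)


def plan(patterns: list[PatternEstimate], max_dynamicity: float = 0.2) -> list[tuple[str, str]]:
    """Hypothetical hybrid plan: most selective patterns first; dynamic ones fetched live."""
    ordered = sorted(patterns, key=lambda p: p.selectivity)
    return [(p.pattern, "live" if p.dynamicity > max_dynamicity else "index") for p in ordered]


query = [
    PatternEstimate("?s foaf:name ?name", selectivity=0.05, dynamicity=0.01),
    PatternEstimate("?s ex:temperature ?t", selectivity=0.001, dynamicity=0.9),
    PatternEstimate("?s rdf:type foaf:Person", selectivity=0.2, dynamicity=0.02),
]
for pattern, access in plan(query):
    print(access, pattern)
```

    The sensor-like pattern is both the most selective and the most dynamic, so it is evaluated first and fetched live, while the two stable patterns are answered from the index.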
    We start by identifying the different levels of dynamicity within Linked Data and highlight the challenges that centralised query approaches face in delivering up-to-date results over such dynamic data. We then present a study of link-traversal-based query execution approaches for Linked Data and show how query performance can be improved by providing reasoning extensions. We have also developed an approximate index structure that summarises the graph-structured content of Web sources, and provide an algorithm that exploits this source summary index. Finally, we propose and evaluate a novel hybrid query engine framework that combines the execution strength of materialised query approaches with the live results from distributed query approaches. The query planning phase uses a cost model that combines standard selectivity and novel dynamicity estimates to enable fast and fresh results.
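    The idea of a source summary can be illustrated with a simple hashed variant: each RDF term is hashed into a bucket per triple position, and each bucket remembers which sources contributed to it. A lookup intersects the source sets for the pattern's constants, so it may return false positives (hash collisions, or constants matched by different triples of the same source) but never misses a source that holds a match. This is an illustrative stand-in, not necessarily the index structure developed in the thesis; `SourceSummary`, `bucket` and the bucket count are hypothetical.

```python
import zlib
from typing import Optional

Pattern = tuple[Optional[str], Optional[str], Optional[str]]


def bucket(term: str, buckets: int) -> int:
    # crc32 is deterministic across runs, unlike Python's salted str hash.
    return zlib.crc32(term.encode("utf-8")) % buckets


class SourceSummary:
    """Approximate per-position summary mapping hashed terms to the sources that contain them."""

    def __init__(self, buckets: int = 64):
        self.buckets = buckets
        self.sources: set[str] = set()
        self.index: dict[tuple[int, int], set[str]] = {}

    def add(self, source: str, triple: tuple[str, str, str]) -> None:
        self.sources.add(source)
        for pos, term in enumerate(triple):
            self.index.setdefault((pos, bucket(term, self.buckets)), set()).add(source)

    def candidates(self, pattern: Pattern) -> set[str]:
        """Sources that may hold matches for a pattern; None marks a variable."""
        result = set(self.sources)
        for pos, term in enumerate(pattern):
            if term is not None:
                result &= self.index.get((pos, bucket(term, self.buckets)), set())
        return result


summary = SourceSummary()
summary.add("http://a.example/data", ("ex:alice", "foaf:knows", "ex:bob"))
summary.add("http://b.example/data", ("ex:carol", "foaf:name", "Carol"))
print(summary.candidates(("ex:alice", "foaf:knows", None)))
```

    Only the candidate sources need to be dereferenced at query time, which is how such a summary can cut the number of live lookups a distributed engine performs.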